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Abstract 

Compressive sensing (CS) has recently emerged as a powerful framework for acquiring sparse 
signals. The bulk of the CS literature has focused on the case where the acquired signal has 
a sparse or compressible representation in an orthonormal basis. In practice, however, there 
are many signals that cannot be sparsely represented or approximated using an orthonormal 
basis, but that do have sparse representations in a redundant dictionary. Standard results in 
CS can sometimes be extended to handle this case provided that the dictionary is sufficiently 
incoherent or well-conditioned, but these approaches fail to address the case of a truly redundant 
or ovcrcomplete dictionary. In this paper we describe a variant of the iterative reconstruction 
algorithm CoSaMP for this more challenging setting. We utilize the .D-RIP, a condition on the 
sensing matrix analogous to the well-known restricted isometry property. In contrast to prior 
work, the method and analysis are "signal-focused" ; that is, they are oriented around recovering 
the signal rather than its dictionary coefficients. Under the assumption that we have a near- 
optimal scheme for projecting vectors in signal space onto the model family of candidate sparse 
signals, we provide provable recovery guarantees. We also provide a discussion of practical 
examples and empirical results. 



1 Introduction 

1.1 Overview 

Compressive sensing (CS) is a powerful new framework for signal acquisition, offering the promise 
that we can acquire a vector x € C n via only m <C n linear measurements provided that x is 
sparse or compressible. 1 Specifically, CS considers the problem where we obtain measurements of 
the form y = Ax + e, where A is an m x n sensing matrix and e is a noise vector. If x is sparse 

"This work was partially supported by NSF grants DMS-1004718 and CCF-0830320, NSF CAREER grant CCF- 
1149225, and AFOSR grant FA9550-09-1-0465. 

1 When we say that a vector z is fc-sparse, we mean that ||z|| = f |supp(;s)j < k <JC n. A compressible vector is one 
that is well-approximated as being sparse. We discuss compressibility in greater detail in Section 2.2. 



or compressible and A satisfies certain conditions, then CS provides a mechanism to recover the 
signal x from the measurement vector y efficiently and robustly. 

Typically, however, signals of practical interest are not themselves sparse, but rather have a 
sparse expansion in some dictionary D. By this we mean that there exists a sparse coefficient 
vector a such that the signal x can be expressed as x = Da. One could then ask the simple 
question: How can we account for this signal model in CS? In some cases, there is a natural way to 
extend the standard CS formulation — since we can write the measurements as y = ADa + e we can 
use standard CS techniques to first obtain an estimate S of the sparse coefficient vector. We can 
then synthesize an estimate x = Da of the original signal. Unfortunately, this is a rather restrictive 
way to proceed for two main reasons: (i) the application of standard CS results to this problem 
will require that the matrix given by the product AD satisfy certain properties that will not be 
satisfied for many interesting choices of D, and (ii) we are not really interested in recovering a per 
se, but rather in obtaining an accurate estimate of x. If the dictionary D is poorly conditioned, 
the signal space recovery error \\x — x\\ 2 could be significantly smaller or larger than the coefficient 
space recovery error \\a — a\\ 2 . It may be possible to recover x in situations where recovering a 
is impossible, and even if we could apply standard CS results to ensure that our estimate of a is 
accurate, this would not necessarily translate into a recovery guarantee for x. 

In this paper we will consider an alternative approach to this problem and develop an algorithm 
for which we can provide guarantees on the recovery of x while making no direct assumptions 
concerning our choice of D. Before we describe our approach, however, it will be illuminating to 
see precisely what goes wrong in an attempt to extend the standard CS formulation. Towards this 
end, let us return to the case where x is itself sparse (when D = I). In this setting, there are many 
possible algorithms that have been proposed for recovering an estimate of x from measurements of 
the form y = Ax + e, including ^-minimization approaches [5, 11] and greedy /iterative methods 
such as iterative hard thresholding (IHT) [4], orthogonal matching pursuit (OMP) [15,19,25] and 
compressive sampling matching pursuit (CoSaMP) [18]. For any of these algorithms, it can be 
shown (see [4, 5, 9, 18]) that x can be accurately recovered from the measurements y if the matrix 
A satisfies a condition introduced in [8] known as the restricted isometry property (RIP). We say 
that A satisfies the RIP of order k if there exists a constant 5 k G (0, 1) such that 

VT^T k < < VTT6~ k (i) 

\\ x \\2 

holds for all x satisfying \\x\\ < k. Importantly, if A is generated randomly with a number of 
rows roughly proportional to the sparsity level k then with high probability A will satisfy the 
RIP [2,16,21]. 

We now return to the case where x is sparse with respect to a dictionary D. If D is unitary 
(i.e., if the dictionary is an orthonormal basis), then the same arguments used to establish (1) can 
be adapted to show that for random A, with high probability AD will satisfy the RIP, i.e., 

/ — ||A.Da:|L / — , 

Vi - h < " „ „ 112 < vTTT k (2) 
ll«ll 2 

will hold for all a satisfying ||c»!|| < k. Thus, standard CS algorithms can be used to accurately 
recover a, and because D is unitary, the signal space recovery error \\x — x\\ 2 will exactly equal 
the coefficient space recovery error \\a — S|| 2 . 

Unfortunately, this approach won't do in cases where D is not unitary and especially in cases 
where D is highly redundant/overcomplete. There are several reasons for this: 
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• Redundancy in D will mean that in general, the representation of a vector x in the dictio- 
nary is not unique — there may exist many possible coefficient vectors a that can be used to 
synthesize x. 

• Coherence (correlations) between the columns of D can make it difficult to satisfy (2). (One 
essentially needs D itself to satisfy the RIP in order to expect AD to satisfy the RIP.) 

• As noted above, if the dictionary D is poorly conditioned, the signal space recovery error 
|| x — x || 2 could differ substantially from the coefficient space recovery error \\a — S|| 2 , further 
complicating any attempt to understand how well we can recover x by appealing to results 
concerning the recovery of a.. 

All of these problems essentially stem from the fact that extending standard CS algorithms in an 
attempt to recover ex is a coefficient-focused recovery strategy. By trying to go from the measure- 
ments y all the way back to the coefficient vector a, one encounters all the problems above due to 
the lack of orthogonality of the dictionary. 

In contrast, in this paper we propose a signal-focused recovery strategy for CS. Our algorithm 
employs the model of sparsity in an arbitrary dictionary D but directly obtains an estimate of 
the signal x, and we provide guarantees on the quality of this estimate in signal space. Our 
bounds require only that A satisfy the D-RIP [6] (which we explain below in Section 1.3) — this 
is a different and less-restrictive condition to satisfy than requiring AD to satisfy the RIP. Our 
algorithm is a modification of CoSaMP [18], and in cases where D is unitary, our "Signal-Space 
CoSaMP" algorithm reduces to standard CoSaMP. 

1.2 Related Work 

Our work most closely relates to Blumensath's Projected Landweber Algorithm (PLA) [3], an 
extension of Iterative Hard Thresholding (IHT) [4] that operates in signal space and accounts for a 
union-of-subspaces signal model. In several ways, our work is a parallel of this one, except that we 
extend CoSaMP rather than IHT to operate in signal space. Both works assume that A satisfies the 
.D-RIP, and implementing both algorithms requires the ability to compute projections of vectors 
in the signal space onto the model family. (These requirements are described more thoroughly in 
Section 1.3 below.) One difference, however, is that our analysis allows for near-optimal projections 
whereas the PLA analysis does not. 2 Other fine differences are noted below. 

Also related are works that employ an assumption of "analysis sparsity," in which a signal x is 
analyzed in a dictionary D, and recovery from CS measurements is possible if D*x is sparse or com- 
pressible. Conventional CS algorithms such as ^-minimization [6, 17], IHT [12], and CoSaMP [12] 
have been adapted to account for analysis sparsity. These works are similar to ours in that they 
provide recovery guarantees in signal space and do not require AD to satisfy the RIP. However, the 
assumption of analysis sparsity is in general different from the "synthesis sparsity" that we assume, 
where there exists a sparse coefficient vector a such that x = Da. Therefore, these algorithms are 
intended for different signal families and potentially different dictionaries. Nevertheless, there are 
some similarities between our work and analysis CoSaMP (ACoSaMP) [12] as we will see below. 

Finally, it is worth mentioning the loose connection between our work and that in "model- 
based CS" [1]. In the case where D is unitary, IHT and CoSaMP have been modified to account 
for structured sparsity models, in which certain sparsity patterns in the coefficient vector a are 

Technically, the analysis of the PLA [3] allows for near-optimal projections but only with an additive error term. 
For sparse models, however, such projections could be made arbitrarily accurate simply by rescaling the signal before 
projecting. Thus, we consider the PLA to require exact projections for sparse models. 
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forbidden. This work is similar to ours in that it involves a projection onto a model set. However, 
the algorithm (including the projection) operates in coefficient space (not signal space) and employs 
a different signal model; the requisite model-based RIP is more similar to requiring that AD satisfy 
the RIP; and extensions to non-orthogonal dictionaries are not discussed. In fact, our work is in part 
inspired by our recent efforts [10] to extend the "model-based CS" framework to a non-orthogonal 
dictionary in which we proposed a similar algorithm to the one considered in this paper. 

1.3 Requirements 

First, to establish notation, suppose that A is an m x n matrix and D is an arbitrary nx d matrix. 
We suppose that we observe measurements of the form y = Ax + e = AD a + e. For an index set 
A G {1, 2, . . . , d} (sometimes referred to as a support set), we let D\ denote the n x |A| submatrix 
of D corresponding to the columns indexed by A, and we let TZ(D\) denote the column span of 
D\. We also use Va : C n — > C n to denote the orthogonal projection operator onto TZ(D\) and 
V^± : C n — > C n to denote the orthogonal projection operator onto the orthogonal complement of 

We will approach our analysis under the assumption that the matrix A satisfies the D-RIP [6] . 
Specifically, we say that A satisfies the .D-RIP of order k if there exists a constant 5k G (0, 1) such 
that 

/ — \\ADat\\ i — . , 

\\ Da h 

holds for all a satisfying ||o:||q < k. We note that this is different from requiring that A satisfy the 
RIP — although (1) and (3) appear similar, the RIP requirement demands that this condition holds 
for vectors x containing few nonzeros, while the D-RIP requirement demands that this condition 
holds for vectors x having a sparse representation in the dictionary D. We also note that, compared 
to the requirement that AD satisfy the RIP (2), it is relatively easy to ensure that A satisfies the 
D-RIP. In particular, for any choice of D, if A is populated with independent and identically 
distributed (i.i.d.) random entries from a Gaussian or subgaussian distribution, then with high 
probability, A will satisfy the D-RIP of order k as long as m = 0(klog(d/k)) [10, Corollary 3.1]. 
In fact, given any matrix A satisfying the traditional RIP, by applying a random sign matrix one 
obtains a matrix satisfying the D-RIP [14] . 

Next, recall that one of the key steps in the traditional CoSaMP algorithm is to project a vector 
in signal space onto the model family of candidate sparse signals. In the traditional setting (when 
D is an orthonormal basis), this step is trivial and can be performed by simple thresholding of the 
entries of the coefficient vector. Our Signal Space CoSaMP algorithm (described more completely 
in Section 2) involves replacing thresholding with a more general projection of vectors in the signal 
space onto the signal model. Specifically, for a given vector z G C n and a given sparsity level k, 
define 

A opt (z, k) := arg min \\z - Vaz\\ 2 . 

A:\A\=k 

The support A opt (z, k) — if we could compute it — could be used to generate the best /c-sparse ap- 
proximation to z; in particular, the nearest neighbor to z among all signals that can be synthesized 
using k columns from D is given by 7 ? A opt (z,fc) z - Unfortunately, computing A opt (z, k) may be diffi- 
cult in general. Therefore, we allow for near-optimal projections to be used in our algorithm. For 
a given vector z G C n and a given sparsity level k, we assume a method is available for producing 

3 Note that V A ± does not represent the orthogonal projection operator onto 7?.(U{i > 2,...,d}\A)- 
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Algorithm 1 Signal Space CoSaMP 



input: A, D, y, k, stopping criterion 
initialize: r = y, x° = 0, I = 0, T = 
while not converged do 

proxy: h = A*r 

identify: Q = So(h, 2k) 

merge: T = Q U V 

update: x = arg min z \\y — Az\\ 2 s.t. z G TZ(Dt) 

r = s D (x,k) 
x i+i = Vv x 

r = y — Ax e+1 

e = £ + i 

end while 
output: x = x £ 



an estimate of A opt (z, k), denoted Sd(z, k) and having cardinality \Sr>{z, k)\ = k, that satisfies 

\\Ph ovt (z,k)Z - Vs D (z,k)z\\ 2 < min (ei \\V Aopt{Z:k) z\\ 2 ,e 2 \\z- V Kopt(Zyk) z\\^j (4) 

for some constants ei,£2 > 0. Setting e\ or e 2 equal to would lead to the requirement that 
VA opt (z,k) z = Vs D {z,k) z exactly. Note that our metric for judging the quality of an approximation 
to A opt (z, k) is entirely in terms of its impact in signal space. It might well be the case that Sr>(z, k) 
could satisfy (4) while being substantially different (or even disjoint) from A op t(z, k). 

It is important to note that computing an optimal or even near-optimal support estimate that 
satisfies the condition (4) remains a challenging task in general. Some important related works, 
however, have run into the same problem. As we previously mentioned, the existing analysis of the 
PLA [3] actually requires exact computation of A op t(z, k). The analysis of ACoSaMP [12] allows a 
near-optimal projection to be used, with a near-optimality criterion that differs slightly from ours. 
Simulations of ACoSaMP, however, have relied on practical (but not theoretically backed) methods 
for computing this projection. In Section 3, we present simulation results for Signal Space CoSaMP 
using practical (but not theoretically backed) methods for computing SD(z,k). We believe that 
computing provably near-optimal projections is an important topic worthy of further study, as it 
is really the crux of the problem in these settings. 



2 Algorithm and Recovery Guarantees 

Given noisy compressive measurements of the form y = Ax+e, our Signal Space CoSaMP algorithm 
for recovering an estimate of the signal x is specified in Algorithm 1. 



2.1 A Bound for the Recovery of Sparse Signals 

For signals having a sparse representation in the dictionary D, we have the following guarantee. 

Theorem 2.1. Suppose there exists a k-sparse coefficient vector a such that x = Da, and suppose 
that A satisfies the D-RIP of order 4k. Then the signal estimate x e+1 obtained after £+1 iterations 
of Signal Space CoSaMP satisfies 



X — X 




x — x e 




2 





+ c 2 ||e|| 2 , 



(5) 
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where C\ and C2 are constants that depend on the isometry constant 5^ and on the approximation 
parameters e\ and €2- In particular, 

c 1 =«2 +£1 fe + . 1 )(2 + , 2 )^ w ft=( (2+ - )((2 ; 1 ^ + * t)+a) )- 

Our proof of Theorem 2.1 appears in Appendix A and is a modification of the original CoSaMP 
proof [18]. Through various combinations of e\, 62, and 84k, it is possible to ensure that C\ < 1 
and thus that the accuracy of Signal Space CoSaMP improves at each iteration. Taking €1 = jq, 
€2 = 1, and 64k = 0.029 as an example, we obtain C\ < 0.5 and C2 < 12.7. Applying the relation 
(5) recursively, we then conclude the following. 

Corollary 2.1. Suppose there exists a k-sparse coefficient vector a such that x = Da, and suppose 
that A satisfies the D-RIP of order \k with 64k = 0.029. Suppose that Signal Space CoSaMP is 
implemented using near-optimal projections with approximation parameters £1 = jq and £2 = 1. 
Then the signal estimate x e obtained after £ iterations of Signal Space CoSaMP satisfies 



x — x 



< 2^|H| 2 + 25.4||e|| 2 . (6) 



By taking a sufficient number of iterations £, the first term on the right hand side of (6) can be 
made arbitrarily small, and ultimately the recovery error depends only on the level of noise in the 
measurements. For sparse signal recovery, this result is fully in line with state-of-the-art bounds 
for traditional CS algorithms such as CoSaMP [18], except that it can be applied in settings where 
the dictionary D is not unitary (supposing the ability to compute the near-optimal projections). 

2.2 A Discussion Concerning the Recovery of Arbitrary Signals 

We can extend our analysis to account for signals x that do not exactly have a sparse representation 
in the dictionary D. For the sake of illustration, we again take ei = ^, £2 = 1, and 64k = 0.029, 
and we show how (6) can be extended. 

We again assume measurements of the form y = Ax + e but allow x to be an arbitrary signal 
in C n . For any vector a k € C d such that 1 1 ck fc 1 1 q < k, we can write 

y = Ax + e = ADoLk + A(x — Dak) + e. 

The term e := A(x — Da^) + e can be viewed as noise in the measurements of the fc-sparse signal 
Dak- Then by (6) we have 



Dai, - x 



< 2- e \\Da k \\ 2 + 25.4 ||e|| 2 < 7T l \\Da k \\ 2 + 25.4 (||e|| 2 + \\A(x - Da k )\\ 2 ) , (7) 



2 

and so using the triangle inequality, 

< 2~ e ||i?a fc || 2 + 25.4 ||e|| 2 + ||x - Da fc || 2 + 25.4 \\A(x - Da k )\\ 2 . (8) 



£ 

x — X 



One can then choose the coefficient vector a k to minimize the right hand side of (8). 

Bounds very similar to this appear in the analysis of both the PLA [3] and ACoSaMP [12]. 
Such results are somewhat unsatisfying since it is unclear how the term \\A(x — Da k )\\ 2 might 
behave. Unfortunately, these bounds are difficult to improve upon when D is not unitary. Under 
some additional assumptions, however, we can make further modifications to (8). For example, the 
following proposition allows us to bound \\A(x — Da k )\\ 2 . 
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Proposition 2.1 (Proposition 3.5 of [18]). Suppose that A satisfies the upper inequality of the 
RIP, i.e., that \\Ax\\ 2 < y/l + 5 k \\x\\ 2 holds for all x € C n with \\x\\ < k. Then, for every signal 

l , 



\\Az\\ 2 < \J\ + 5 k 
Plugging this result in to (8), we have 



(9) 



x — x 



< T l ||Ua fc || 2 + 25.4 ||e|| 2 + (25.4^1 + ^ + 1) \\x - Da k \\ 2 
25.4 v / T+~^ 



+ 



\x - Da k \\ l . 



(10) 



For any x € C n , one could define the model mismatch quantity 



mismatch(«) := inf 

«fc:||«fello< fc 

Plugging this definition into (10), we obtain 



\x — Da.k\\ 2 H — -j= \\x — DdkWi 



x — x 



< 2~ £ ||£>a fc || 2 + 25.4 ||e|| 2 + 26.4^1 + 5 k ■ mismatch(a;). 



(11) 



In some sense, one can view mismatch(a^) as the "distance" from x to the set of signals that are 
A;-sparse in the dictionary D, except that the actual "distance" being measured is a mixed £2/^1 
norm. In cases where one expects this distance to be small, (11) guarantees that the recovery error 
will be small. 

We close this discussion by noting that if we make the stronger assumption that AD actually 
satisfies the RIP, we can also measure the model mismatch in the coefficient space rather than the 
signal space. Let a € C d be any vector that satisfies x = Da. Then using a natural extension of 
Proposition 2.1, we conclude that 

\\A(x - Da k )\\ 2 = \\AD(a - a k )\\ 2 < V 1 + s k - <*fc|| 2 + ^ II" - a k\\^j ■ 
When a is compressible, the recovery error (8) will be reasonably small. 



3 Simulations 



As we discussed in Section 1.3, the main difficulty in implementing our algorithm is in computing 
projections of vectors in the signal space onto the model family of candidate sparse signals. One such 
projection is required in the identify step in Algorithm 1; another such projection is required in the 
update step. Although our theoretical analysis can accommodate near-optimal support estimates 
Sd(z, k) that satisfy the condition (4), computing even near-optimal supports can be a challenging 
task for many dictionaries of practical interest. In this section, we present simulation results 
using practical (but heuristic) methods for attempting to find near-optimal supports Sd(z, k). We 
see that the resulting algorithms — even though they are not quite covered by our theory — can 
nevertheless outperform classical CS reconstruction techniques. 

In all simulations that follow, we set the signal length n = 256. We let D be an n x d dictionary 
(two different dictionaries are considered below), and we construct a length-d coefficient vector a 
with k = 8 nonzero entries chosen as i.i.d. Gaussian random variables. We set x = Da, construct A 
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Figure 1: Performance in recovering signals having a k = 8 sparse representation in a dictionary D with 
orthogonal, but not normalized, columns. The plot shows, for various numbers of measurements m, the 
percent of trials in which each algorithm recovers the signal exactly. Signal Space CoSaMP (in which we can 
compute optimal projections) outperforms an unmodified CoSaMP algorithm. 

as a random mxn matrix with i.i.d. Gaussian entries, and collect noiseless measurements y = Ax. 
After reconstructing an estimate oi x, we declare this recovery to be perfect if the SNR of the 
recovered signal estimate is above 100 dB. All of our simulations were performed via a MATLAB 
software package that we have made available for download at http://users.ece.gatech.edu/ 
-mdavenport /soft ware. 

3.1 Renormalized Orthogonal Dictionary 

As an instructive warm-up, we begin with one almost trivial example where the optimal projection 
can be computed exactly: we construct D by taking an orthobasis and renormalizing its columns 
while maintaining their orthogonality. To compute A op t(z,k) with such a dictionary, one merely 
computes D*z, divides this vector elementwise by the column norms of D, and sets A equal to the 
positions of the k largest entries. 

For the sake of demonstration, we set D equal to the nxn identity matrix, but we then rescale 
its first n/2 diagonal entries to equal 100 instead of 1. We construct sparse coefficient vectors a 
as described above, with supports chosen uniformly at random. As a function of the number of 
measurements m, we plot in Figure 1 the percent of trials in which Signal Space CoSaMP recovers x 
exactly. We see that with roughly 50 or more measurements, the recovery is perfect in virtually all 
trials. In contrast, this figure also shows the performance of traditional CoSaMP using the combined 
dictionary AD to first recover a. Because of the non-normalized columns in D, CoSaMP almost 
never recovers the correct signal; in fact its support estimates almost always are contained in the 
set {1,2, . . . , n/2}. It is not surprising that traditional CoSaMP will fail if it is not modified to 
account for the various column norms in D; the point here is that our algorithm gives a principled 
way to make this (trivial) modification. 

3.2 Overcomplete DFT Dictionary 

As a second example, we set d = An and let D be a 4x overcomplete DFT dictionary. In this 
dictionary, neighboring columns are highly coherent, while distant columns are not. We consider 
two scenarios: one in which the k = 8 nonzero entries of a are randomly positioned but well- 
separated (with a minimum spacing of 8 zeros in between any pair of nonzero entries), and one in 
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Figure 2: Performance in recovering signals having a k = 8 sparse representation in a 4x overcomplete 
DFT dictionary. Two scenarios are shown: (a) one in which the k = 8 nonzero entries of a are randomly 
positioned but well-separated, and (b) one in which the k = 8 nonzero entries all cluster together in a single, 
randomly-positioned block. Algorithms involving OMP and i\-minimization perform well in the former 
scenario; algorithms involving CoSaMP perform well in the latter. In general, the Signal Space CoSaMP 
variants outperform the corresponding traditional CS algorithm. 

which the k = 8 nonzero entries all cluster together in a single, randomly-positioned block. Because 
of the nature of the columns in D, we see that many recovery algorithms perform differently in 
these two scenarios. 

3.2.1 Well-separated coefficients 

Figure 2(a) plots the performance of six different recovery algorithms for the scenario where the 
nonzero entries of a are well-separated. Two of these algorithms are the traditional OMP and 
CoSaMP algorithms from CS, each using the combined dictionary AD to first recover a. We 
actually see that OMP performs substantially better than CoSaMP in this scenario, apparently 
because it can select one coefficient at a time and is less affected by the coherence of D. It is 
somewhat remarkable that OMP succeeds at all, given that AD will not satisfy the RIP and we 
are not aware of any existing theory that would guarantee the performance of OMP in this scenario. 

We also show in Figure 2(a) two variants of Signal Space CoSaMP: one in which OMP is 
used for computing Sd{z, k) (labeled "SSCoSaMP (OMP)"), and one in which CoSaMP is used for 
computing S D (z, k) (labeled "SSCoSaMP (CoSaMP)"). That is, these algorithms actually use OMP 
or CoSaMP as an inner loop inside of Signal Space CoSaMP to find a sparse solution to the equation 
z = Da. In this scenario, we see that the performance of SSCoSaMP (OMP) is substantially better 
than OMP, while the performance of SSCoSaMP (CoSaMP) is poor. We believe that this happens 
for the same reason that traditional OMP outperforms traditional CoSaMP. In general, we have 
found that when OMP performs well, SSCoSaMP (OMP) may perform even better, and when 
CoSaMP performs poorly, SSCoSaMP (CoSaMP) may still perform poorly. 

Figure 2(a) also shows the performance of two algorithms that involve convex optimization for 
sparse regularization. One, labeled u £x" uses an ^-minimization approach [24] to find a sparse 
coefficient vector a.' subject to the constraint that y = ADa'. This algorithm actually outperforms 
traditional OMP in this scenario. The other, labeled "SSCoSaMP (ti)," is a variant of Signal 
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Space CoSaMP in which l\ -minimization is used for computing Sd(z, k). A Specifically, to compute 
So(z,k), we search for the vector a' having the smallest t\ norm subject to the constraint that 
z = Da' , and we then choose the support that contains the k largest entries of this vector. 
Remarkably, this algorithm performs best of all. We believe that this is likely due to the fact that, 
for the overcomplete DFT dictionary, .^-minimization is known to be capable of finding A op t(z, k) 
exactly when z = V^ opt ^ z ^z and the entries of A op t(z, k) are sufficiently well-separated [7]. While 
we do not guarantee that this condition will be met within every iteration of Signal Space CoSaMP, 
the fact that the original coefficient vector a has well-separated coefficients seems to be intimately 
related to the success of l\ and SSCoSaMP {t\) here. 

We must note that all of the above algorithms involve a step where a least-squares problem 
must be solved on an estimated support set. 5 When a dictionary D is highly coherent, it can be 
numerically challenging to solve this problem, as the resulting submatrix can be very poorly condi- 
tioned. Following our discussion in [10], we employ Tikhonov regularization [13, 20, 22, 23] and solve 
a norm-constrained least-squares problem to improve its conditioning. For this we must provide 
our algorithms with an upper bound on the norm of the sparse coefficient vector. In the simulations 
above, we have selected this bound to be lOx the true norm of the original ex. The selection of 
this bound does not have a substantial impact on the performance of OMP or SSCoSaMP (OMP), 
but we have noticed that CoSaMP and SSCoSaMP (CoSaMP) perform somewhat better when the 
true norm ||ct|| 2 is provided as an oracle. 

3.2.2 Clustered coefficients 

Finally, Figure 2(b) plots the performance of the same six recovery algorithms for the scenario 
where the nonzero entries of a are clustered into a single block. Although one could of course 
employ a block-sparse recovery algorithm in this scenario, our intent is more to study the impact 
that neighboring active atoms have on the algorithms above. 

In this scenario, between the traditional greedy algorithms, CoSaMP now outperforms OMP, 
apparently because it is designed to select multiple indices at each step and will not be as affected by 
the coherence of neighboring active columns in D. We also see that the performance of SSCoSaMP 
(CoSaMP) is somewhat better than CoSaMP, while the performance of SSCoSaMP (OMP) is poor. 
We believe that this happens for the same reason that traditional CoSaMP outperforms traditional 
OMP. In general, we have found that when CoSaMP performs well, SSCoSaMP (CoSaMP) may 
perform even better, and when OMP performs poorly, SSCoSaMP (OMP) may still perform poorly. 

In terms of our condition for perfect recovery (estimating x to within an SNR of 100 dB or more), 
neither of the algorithms that involve ^-minimization perform well in this scenario. However, we 
do note that both i\ and SSCoSaMP do frequently recover an estimate of x with an SNR of 
50 dB or more, though still not quite as frequently as SSCoSaMP (CoSaMP) does. 

In these simulations, we again use Tikhonov regularization with a norm bound that is 10 x the 
true norm of the original a. However, we have not found that changing this norm bound has a 
significant impact on CoSaMP or SSCoSaMP (CoSaMP) in this scenario. We close by noting that 
in this scenario we found it beneficial to run CoSaMP and the three Signal Space CoSaMP methods 
for a few more iterations than in the case of well-separated coefficients; convergence to exactly the 
correct support can be slow in this case where multiple neighboring atoms in a coherent dictionary 
are active. 

4 We are not unaware of the irony of using ^-minimization inside of a greedy algorithm. 
5 We use this for debiasing after running t\. 
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A Proof of Theorem 2.1 

The proof of Theorem 2.1 requires four main lemmas, which are listed below and proved in Sec- 
tions A.1-A.4. In the lemmas below, v = x — x e denotes the recovery error in signal space after £ 
iterations. 

Lemma A.l (Identify). H^xf II2 - ((2 + e i)$4k + e i) IMI2 + (2 + ei)\/T + 5^ ||e|| 2 . 
Lemma A. 2 (Merge). 1 1 T^j-"- 1 - 50 1 1 2 — Il'^ta- L ' y ll2- 



Lemma A.3 (Update). \\x - x\\ 2 < yj jz^ \[P T ±x\\ 2 + 

Lemma A. 4 (Estimate), jjx — a^ +1 || 2 < (2 + £2) \\x — x\\ 2 . 
Combining all four statements above, we have 
e+i 



X — X 



< (2 + e 2 )||aj-aj| 
2 



< (2 + e 2 )\ j- \\V T ±x\\ 2 + \\e\ 

V 1 - d 4k V 1 - Oik 

<r fo^ ^ l l + ^ I,-, I, 4 + 2e 2 .. . 

- (2 + e2 V^ ll ^ ll2 + 7r^ H 



1 + 5 4fc 



< (2 + e 2)t/ r3 / i ((2 + ei)5 4 fc + ei) N 



4fc 



2 



l + <*4fc, rt , x A — : — ? I, „ 4 + 2e 2 



+ (2 + £2)1/7^^(2 + ei)Vl + «54Jfe||e|| 2 + 



(2 + e 2 )((2 + ei)(l + 5 4fc ) + 2; 



1 + &tfc 
1 - 84k 



x - x l 



+ 



2 V V^5. 



Ak 



= ((2 + ei)<5 4 fc + ei)(2 + e2)^ 
This completes the proof of Theorem 2.1. 
A.l Proof of Lemma A.l 

In order to prove the four main lemmas, we require two supplemental lemmas, the first of which is 
a direct consequence of the D-RIP. 

Lemma A. 5 (Consequence of D-RIP). For any index set B and any vector z G C n , 

\\V B A*AV B z - V B z\\ 2 < 5\ B \ \\z\\ 2 . 



Proof. We have 



S\ B \ > SUp 



I A II 2 II II 2 
I 1 1 2 1 1 *^ 1 1 2 



x: \B\— sparse in D 11*^112 

\\\AVbx\\ 2 2 - \\V B x\\l\ 

> SUp — — ^5 — 

\\Vbx\\ 2 2 

\\\AV B x\\ 2 2 -\\V B x\\l\ 

> sup — z 2 — 

X J J X 1 1 2 

= sup I ||A'Pb2:|| 2 - UPb^H 2 
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sup \(V* B A*AV B x - V* B V B x,x) 

x\\ a =l 

sup \{V B A*AV B x -V B x,x)\ 

x\U=l 



\V B A*AV B -Vb\ 



2 ) 



where the third line follows because ||Pb2:|| 2 < ||x|| 2 for all x, and the last line follows from the 
fact that V B A*AV B - V B is self-adjoint. □ 

We'll also utilize an elementary fact about orthogonal projections. 

Lemma A. 6. For any pair of index sets A, B with A C B, Va = VaPb- 

Now, to make the notation simpler, let v = A* Av + A*e and note that SI = Su(v,2k). Let 
ST = A op t('U, 2k) denote the optimal support of size 2k for approximating v, and set R = Sd(v, 2k). 
Using this notation we have 



WP a ±v\ 



< 

< 
< 



\\v - Vqv\\ 2 
\\v - Vnv\\ 2 

HO - V Ru n*v) + {V Ru n*v -7>n*v) + {Va*v - Vnv)\\ 2 
\\v - V Ru n*v\\ 2 + WPrlm*v - Vn*v\\ 2 + \\7>n*v - Vnv\\ 2 

\\v - V RUQ * A*Av\\ 2 + \\r Ru n*A*e\\ 2 + \\V m ^v - Vq,*v\\ 2 + \\Vn*v - Vqv\\ 2 , (12) 



where the second line follows from the fact that Vqv is the nearest neighbor to v among all vectors 
in 1Z(Dq) and the fourth and fifth lines use the triangle inequality. 

Below, we will provide bounds on the first and second terms appearing in (12). To deal with 
the third term in (12), note that for any II which is a subset of R U Q*, we can write 

v - V u v = (v - V Ru n*v) + (V Ru n*v - Vuv), 

where v — V RU fi*v is orthogonal to TZ(D RU fi*), and V R un*v — Vnv is contained in TZ(D Ru q*). Thus 
we can write 

\\v - Vuv\\l = \\v - V Ru n*v\\l + \\V Ru n*v - Vuv\\l ■ 

Recall that over all index sets II with |LT| = 2k, \\v — Vyiv\\ 2 is minimized by choosing II = SI*. 
Thus, over all II which are subsets of RU SI* with | XX | = 2k, \\Vruq*v — Viiv\\ 2 must be minimized 
by choosing IT = $7*. In particular, we have the first inequality below: 



\\V Ru n*v - Vn*v\\ 2 < 



< 

< 
< 
< 



\\V Ru n*v - V R v\\ 2 

WPRun*v - V R V Ru n*v\\ 2 

\\V Ru n*v-r R {v + A*e)\\ 2 

\\{V RuQ *A*Av -v) + (V Ru n*A*e - V R A*e)\\ 2 

\\Vrjci. A*Av - v\\ 2 + \\P Ru n*A*e - V R A*e\\ 2 

\\V RuQ *A*Av - v\\ 2 + ||(7 - V R )V Ru n*A*e\\ 2 

\\V R un*A*Av - v\\ 2 + \\V Run * A*e\\ 2 . 



(13) 



The second line above uses Lemma A. 6, the third line follows from the fact that V R V R un*v must 
be the nearest neighbor to V Rli q*v among all vectors in 1Z(D R ), the fourth line uses the fact that 
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Vrv = v because R = Sr>{v,2k) and both x and x s ~ are fc-sparse in D, the fifth line uses the 
triangle inequality, the sixth line uses Lemma A. 6, and the seventh line follows from the fact that 
(I — Vr) is an orthogonal projection and hence has norm bounded by 1. 

To deal with the fourth term in (12), note that from the definition of Q,* and from (4), we have 
the first inequality below: 



\\Vn*v - Vdv\\ 2 < £ i 
= ei 
<ei 

< ei 
<ei 

< ei 



ll^|| 2 

\\Vn*(A*Av + A*e)\\ 2 

\\V n *A*Av\\ 2 + e 1 \\Vn*A*e\\ 2 

\\Vn*V Ru n*A*Av\\ 2 + e l \\Vn*V Ru n*A*e\\ 2 

\\r Run *A*Av\\ 2 + ei \\V R un* A*e\\ 2 

\\v\\ 2 + ei HPflun* - «|| 2 + ei H^Hun* ^4.*e|| 2 , 



(14) 



The third line above uses the triangle inequality, the fourth line uses Lemma A. 6, the fifth line uses 
the fact Vq* is an orthogonal projection and hence has norm bounded by 1, and the sixth line uses 
the triangle inequality. 

Combining (12), (13), and (14) we see that 

\\V n ±v\\ 2 < (2 + ei) \\Vbusi*A*Av - v\\ 2 + (2 + ei) \\V Run *A*e\\ 2 + a \\v\\ 2 . 

Since v G 1Z(D R ), it follows that v G TZ(D Ru q*), and so 

\\V Run *A*Av - v\\ 2 = \\V Ru n*A*AV Run *v - V Ru n*v\\ 2 < 5 Ak \\v\\ 2 , 

where we have used Lemma A. 5 to get the inequality above. In addition, we know that the operator 
norm of V R un* A* satisfies 



UQ* 



\\(P Ru n*A*)*\\ 2 = \\AP Run *\\ 2 < y/1 + S 4k , 



which follows from the D-RIP. Specifically, for any x, 



\AV Ru q*x\\ 2 \\AVru£i*x\\ 2 r—^ 

n— n < -rr=. n— < V 1 + 0. 

\\ x \\2 \\rRuii*x\\ 2 



lk- 



Putting all of this together, we have 

\\V n ±v\\ 2 < ((2 + ei)<5 4fc + ei) ||«|| 2 + (2 + e^y/l + S^ \\e\\ 2 . 

A. 2 Proof of Lemma A. 2 

First note that by the definition of T, x l G TZ(Dt), and hence V T ±x e = 0. Thus we can write, 



\V T ±x\\ 2 



Finally, since O C T, we have that 



V T ±(x - x e 



\V T ±v 



T ±v\\ 2 . 



||^ T ±-y|| 2 < \\V n ±v\\ 2 . 
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A. 3 Proof of Lemma A. 3 



To begin, we note that x — x has a 4/c-sparse representation in D, thus, applying the D-RIP (of 
order 4k) we have 



\Ax — Ax\ 

v 7 ! - 



By construction, 



|| Ax — Ax + e || 2 < || Ax — Az + e|| 2 
for any z G TZ(Dt), in particular for z = VtX. Thus, 



I ® !C||2 ^ ; 



< 
< 
< 



| .A.X .A.iXj ~\~ c 1 1 2 ~\~ ||^||2 
\Ax — AVtx + e || 2 + ||e|| 2 
I Ax - AVtx\\ 2 + 2 ||e|| 2 



v 7 ! - &tfc 

where the second and fourth lines use the triangle inequality. By applying the .D-RIP we obtain 



\Ax - AV T x\\ 2 < y/1 + 6 4k \\x - V T x\\ 2 = y/1 + S 4k \\V T ± 



x\ 



Combining all of this, 



I *^ *^ 1 1 2 — 



VI + Sik \\V T ±x\\ 2 + 2 || e I 



4fc 



A. 4 Proof of Lemma A. 4 

Using the triangle inequality, we have 



x — x 



e+i 



X — X + X — X 



e+i 



< \\x — £C|L + 



X — X 



e+i 



Recall that T = Sd{x, k) and x e+1 = Vrx. Let T* = A opt (x, k) denote the optimal support of size 
k for approximating x. Then we can write 



x — x 



e+i 



< \\x — Vr*x\\ 2 + llPr*^ - T^r 32 1 1 2 

< \\x — 'Pr*5|| 2 + £2 \\x — Vr*x\\ 2 , 

where the first line follows from the triangle inequality, and the second line uses (4). Combining 
all of this, we have 



x — x 



e+i 



< \\x - x\\ 2 + (1 + €2) \\x - Vt*x\\ 2 

< \\x - x\\ 2 + (1 + e 2 ) \\x - x\\ 2 
= (2 + e 2 ) ||aj - x|| 2 , 

where the second line follows from the fact that Vr* x is the nearest neighbor to x among all vectors 
having a fc-sparse representation in D. 
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